Access Time Tradeoffs in Archive Compression
Authors: Matthias Petri, Alistair Moffat, P.C. Nagesh, and Anthony Wirth
Abstract
Web archives, query and proxy logs, and similar collections can be very large and highly repetitive, and are accessed only sporadically and partially, rather than continually and holistically. This type of data is ideal for compression-based archiving, provided that random access to small fragments of the original data can be achieved without needing to decompress everything. The recent RLZ (relative Lempel-Ziv) compression approach uses a semi-static model extracted from the text to be compressed, together with a greedy factorization of the whole text encoded using static integer codes. Here we demonstrate more precisely than before the scenarios in which RLZ excels. We contrast RLZ with alternatives based on block-based adaptive methods, including approaches that “prime” the encoding for each block, and measure a range of implementation options using both hard-disk (HDD) and solid-state (SSD) drives. For HDD, the dominant factor affecting access speed is the compression rate achieved, even when this involves larger dictionaries and larger blocks. When the data is on SSD the same effects are present, but not as markedly, and more complex trade-offs apply.
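The greedy factorization mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`longest_match`, `rlz_factorize`, `rlz_decode`), the literal convention (a factor of length 0 carries one raw byte), and the naive substring search are all assumptions made for illustration, and the static integer coding of the factors is omitted.

```python
def longest_match(dictionary: bytes, text: bytes, i: int):
    """Return (pos, length) of the longest prefix of text[i:] found in dictionary.

    Naive search for illustration only; a practical system would use an
    index over the dictionary (for example a suffix array).
    """
    best_pos, best_len = -1, 0
    length = 1
    while i + length <= len(text):
        pos = dictionary.find(text[i:i + length])
        if pos < 0:
            break  # no longer prefix can match either
        best_pos, best_len = pos, length
        length += 1
    return best_pos, best_len


def rlz_factorize(text: bytes, dictionary: bytes):
    """Greedily split text into (position, length) factors over dictionary.

    Assumed convention: a factor with length 0 is a literal whose first
    field holds the raw byte value.
    """
    factors = []
    i = 0
    while i < len(text):
        pos, length = longest_match(dictionary, text, i)
        if length == 0:
            factors.append((text[i], 0))  # byte absent from the dictionary
            i += 1
        else:
            factors.append((pos, length))
            i += length
    return factors


def rlz_decode(factors, dictionary: bytes) -> bytes:
    """Rebuild the text from its factors; only the dictionary is needed."""
    out = bytearray()
    for pos, length in factors:
        if length == 0:
            out.append(pos)
        else:
            out += dictionary[pos:pos + length]
    return bytes(out)
```

Because every factor refers only to the fixed dictionary and never to earlier output, any run of factors can be decoded without touching the rest of the archive, which is the property that makes random access to small fragments possible.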
Similar resources
On Structures of Inverted Index for Query Processing Efficiency
Efficiency. On Structures of Inverted Index for Query Processing Efficiency: Xingshen Song, Xueping Zhang, Yuexiang Yang, Jicheng Quan, and Kun Jiang. Access Time Tradeoffs in Archive Compression: Matthias Petri, Alistair Moffat, P.C. Nagesh, and Anthony Wirth. Large Scale Sentiment Analysis with Locality Sensitive BitHash ...
The still image lossy compression standard - JPEG
Reducing the size of image files is an important step when we transmit files across networks (Wiseman, Schwan & Widener, 2004) or when we would like to archive libraries. Usually, JPEG can remove the less important data before the compression; hence JPEG is able to compress images substantially, which makes a large difference in transmission time and disk space. The processing time...
Performance Tradeoffs for Header Compression in MPLS Networks
In this paper, we propose the use of compression techniques for RTP/UDP/IP/MPLS headers in MPLS networks to enable header compression over several IP hops. We consider the transmission of low-bitrate real-time traffic and analytical results illustrate performance tradeoffs regarding network utilization by user data. Header compression reduces the gross rate of low-bitrate streams and increases ...
Exploring compression techniques for ROOT IO
ROOT provides a flexible format used throughout the HEP community. The number of use cases, from an archival data format to end-stage analysis, has required a number of tradeoffs to be exposed to the user. For example, a high “compression level” in the traditional DEFLATE algorithm will result in a smaller file (saving disk space) at the cost of slower decompression (costing CPU time when read)....
Rediscovery of Time Memory Tradeoffs
Some of the existing time memory tradeoff (TMTO) attacks on specific systems can be reinterpreted as methods for inverting general one-way functions. We apply these methods back to specific systems in ways not considered before. This provides the following startling results. No stream cipher can provide security equal to its key length; some important block cipher modes of operation are vulnerabl...
Publication date: 2015